[CALCITE-7620] Result of FILTER clause in window functions is incorrect by xuzifu666 · Pull Request #5040 · apache/calcite

xuzifu666 · 2026-06-23T04:11:40Z

jira: https://issues.apache.org/jira/browse/CALCITE-7620

xuzifu666 · 2026-06-23T04:36:10Z

+   * partition by ORDER BY keys. The output order is therefore not defined by
+   * a simple collation in the general case, so we conservatively report no
+   * collations. */
  public static @Nullable List<RelCollation> window(RelMetadataQuery mq, RelNode input,


RelMdCollation.window needed to change because the original collation derivation for windows was too optimistic. It led the optimizer to believe that the window output retained the input order, so the optimizer wrongly removed the top-level Sort.
The original implementation had the following problem:

Previously, RelMdCollation.window returned mq.collations(input) directly, meaning "the window operator preserves the order of its input rows as is." However, EnumerableWindow first groups the rows by the PARTITION BY key using SortedMultiMap and then sorts the rows within each group by the window ORDER BY key. The input order is therefore not preserved: rows come out grouped by the PARTITION BY keys and ordered by the ORDER BY keys within each group, not in input order.
This caused the top-level Sort to be incorrectly optimized away.

For example, when the SQL has ORDER BY empno and the window also happens to be ordered by empno, the optimizer assumes that the window output is globally ordered and deletes the top-level EnumerableSort. The resulting output is grouped by deptno, not sorted by empno.
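The effect described above can be reproduced with a minimal sketch outside Calcite. The `Emp` class, the sample rows, and the use of a `TreeMap` for partitions are assumptions made for illustration only; this is not the actual EnumerableWindow code, and the real partition iteration order may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WindowOrderSketch {
  /** Hypothetical row type for illustration; not a Calcite class. */
  static final class Emp {
    final int empno;
    final int deptno;

    Emp(int empno, int deptno) {
      this.empno = empno;
      this.deptno = deptno;
    }
  }

  /** Groups rows by the partition key (deptno), then sorts each group by empno. */
  static List<Emp> windowOutput(List<Emp> input) {
    // TreeMap is an assumption that keeps the partition order deterministic here.
    Map<Integer, List<Emp>> partitions = new TreeMap<>();
    for (Emp e : input) {
      partitions.computeIfAbsent(e.deptno, k -> new ArrayList<>()).add(e);
    }
    List<Emp> out = new ArrayList<>();
    for (List<Emp> partition : partitions.values()) {
      partition.sort(Comparator.comparingInt((Emp e) -> e.empno));
      out.addAll(partition);
    }
    return out;
  }

  /** Returns whether the rows are globally ordered by empno. */
  static boolean isSortedByEmpno(List<Emp> rows) {
    for (int i = 1; i < rows.size(); i++) {
      if (rows.get(i - 1).empno > rows.get(i).empno) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Made-up input that is already sorted by empno.
    List<Emp> input = List.of(
        new Emp(1, 20), new Emp(2, 10), new Emp(3, 20), new Emp(4, 10));
    List<Emp> out = windowOutput(input);
    System.out.println("input sorted by empno:  " + isSortedByEmpno(input));
    System.out.println("output sorted by empno: " + isSortedByEmpno(out));
  }
}
```

With this input the output rows are ordered by empno only within each deptno group, so a top-level sort on empno is still required.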

xuzifu666 · 2026-06-23T04:38:31Z

    this.hints = calc.getHints();
    this.cluster = calc.getCluster();
-    this.traits = calc.getTraitSet();
+    this.traits = calc.getTraitSet()


CalcRelSplitter.java needed to change because, when ProjectToWindowRule splits a Calc/Project containing window functions, it passes the original node's trait set (including the contaminated collation) to the new nodes created by the split. This caused the optimizer to incorrectly remove the top-level Sort before the window was expanded.

iwanttobepowerful · 2026-06-23T06:15:29Z

 !ok

 # Test 3: Multiple FILTER with OVER on different aggregates
 select empno, deptno,


Another case:

select ename, job, hiredate,
  avg(sal) over (order by hiredate rows 3 preceding) as avg_sal,
  avg(sal) filter (where job = 'MANAGER') over (order by hiredate rows 3 preceding) as avg_mgr_sal
from emp
order by hiredate;

OK, this test has been added. The AVG_MGR_SAL column exercises the FILTER change, and the results are consistent with PostgreSQL: https://onecompiler.com/postgresql/44smkpfxb
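For readers who want to check the expected AVG_MGR_SAL values by hand, here is a minimal sketch of the semantics of AVG(sal) FILTER (WHERE ...) OVER (ORDER BY ... ROWS 3 PRECEDING). The class name, method name, and sample salaries are hypothetical and are not taken from the PR or from the scott.emp table. Rows are assumed to be already in window order.

```java
import java.util.ArrayList;
import java.util.List;

final class FilteredMovingAvgSketch {
  /**
   * For each row i, averages sal[j] over the frame [i - preceding, i],
   * counting only rows where keep[j] is true. Yields null when no row
   * in the frame passes the filter, as AVG does over an empty input.
   */
  static List<Double> filteredAvg(double[] sal, boolean[] keep, int preceding) {
    List<Double> result = new ArrayList<>();
    for (int i = 0; i < sal.length; i++) {
      double sum = 0;
      int count = 0;
      for (int j = Math.max(0, i - preceding); j <= i; j++) {
        if (keep[j]) {
          sum += sal[j];
          count++;
        }
      }
      if (count == 0) {
        result.add(null);
      } else {
        result.add(sum / count);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up salaries in hiredate order; isManager marks rows with job = 'MANAGER'.
    double[] sal = {800, 1600, 2975, 1250, 2850, 2450};
    boolean[] isManager = {false, false, true, false, true, true};
    System.out.println(filteredAvg(sal, isManager, 3));
  }
}
```

The filter restricts which rows contribute to the aggregate; it does not change which rows belong to the frame.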

xuzifu666 · 2026-06-24T03:31:08Z

Are you happy with the current changes? This issue affects the correctness of FILTER results, so I'd like to fix it as soon as possible so that the feature is usable in production. PTAL~ @iwanttobepowerful

iwanttobepowerful · 2026-06-24T03:55:59Z

-        EnumerableWindow(window#0=[window(partition {0} rows between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])], constants=[[false]])
-          EnumerableCalc(expr#0..2=[{inputs}], DEPTNO=[$t0])
-            EnumerableTableScan(table=[[scott, DEPT]])
+    EnumerableCalc(expr#0..2=[{inputs}], expr#3=[1], expr#4=[<=($t2, $t3)], proj#0..2=[{exprs}], $condition=[$t4])


Curious about the motivation behind this update?

This fix changes the collation metadata derivation for the window operator. As a result, the plan structure of two existing !plan tests changes legitimately, while the computed results remain unchanged.

Specific changes: previously, the collation derivation for EnumerableWindow was optimistic and claimed to retain the input collation. Because CalcRelSplitter also inherited the original Calc's collation, the optimizer pushed the Sort down onto the Window and merged it with the outer Calc. The original plan was therefore:

EnumerableSort
  EnumerableCalc(condition + window output)
    EnumerableWindow

After the fix, EnumerableWindow no longer claims any collation, and CalcRelSplitter no longer inherits the contaminated collation. The Sort can only be pushed down as far as the Window and is wrapped in an outer Calc for the projection. The plan becomes:

EnumerableCalc(projection)
  EnumerableSort
    EnumerableCalc(condition)
      EnumerableWindow

The two plans are logically equivalent; only the relative positions of the Sort and the projecting Calc have changed. Therefore, only the expected output of these two !plan blocks was updated, without modifying any result data. This also keeps the sorted results fully consistent with PostgreSQL.

The plan generated by the new correlation algorithm looks much better.

!use scott
!set outputformat mysql
select sal from emp e where 123 NOT IN (
  select cast(null as int) from dept d where e.deptno=d.deptno);
+-----+
| SAL |
+-----+
+-----+
(0 rows)

!ok
EnumerableCalc(expr#0..3=[{inputs}], expr#4=[NOT($t3)], SAL=[$t1], $condition=[$t4])
  EnumerableHashJoin(condition=[AND(IS NOT DISTINCT FROM($2, $4), =(123, $3))], joinType=[left_mark])
    EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], SAL=[$t5], DEPTNO=[$t7])
      EnumerableTableScan(table=[[scott, EMP]])
    EnumerableCalc(expr#0..2=[{inputs}], expr#3=[null:INTEGER], $f0=[$t3], DEPTNO=[$t0])
      EnumerableTableScan(table=[[scott, DEPT]])
!plan

Yes, this plan is a better fit!

xiedeyantu · 2026-06-24T12:52:17Z

    this.cluster = calc.getCluster();
-    this.traits = calc.getTraitSet();
+    this.traits = calc.getTraitSet()
+        .replaceIfs(RelCollationTraitDef.INSTANCE, Collections::emptyList);


The collation reset should be scoped to the LogicalWindow creation rather than applied in CalcRelSplitter’s constructor. The ordering issue is specific to window nodes: a Window does not preserve or expose a simple output collation, so it should not inherit the original Calc collation trait.
Suggested change:
ProjectToWindowRule.java:260

  @Override protected RelNode makeRel(RelOptCluster cluster,
      RelTraitSet traitSet, RelBuilder relBuilder, RelNode input,
      RexProgram program, List<RelHint> hints) {
    checkArgument(program.getCondition() == null,
        "WindowedAggregateRel cannot accept a condition");
+   traitSet = traitSet.replaceIfs(RelCollationTraitDef.INSTANCE,
+       Collections::emptyList);
    return LogicalWindow.create(cluster, traitSet, relBuilder, input,
        program, hints);
  }

This keeps the generic splitter behavior unchanged and makes the intent local to the window-producing RelType.

Makes sense, done.

xiedeyantu · 2026-06-24T12:54:53Z

    SqlCall call = (SqlCall) node;
    bb.getValidator().deriveType(bb.scope, call);
    SqlCall aggCall = call.operand(0);
+    @Nullable SqlNode filter = null;


I rarely see the need for @Nullable in code like this.

This is to satisfy the Checker Framework null-safety check. Otherwise the calling method would need to be modified, whereas here simply adding the annotation resolves the issue.

xiedeyantu · 2026-06-24T13:55:18Z

+    Util.discard(mq);
+    Util.discard(input);
+    Util.discard(groups);
+    return Collections.emptyList();


Suggested change

-    Util.discard(mq);
-    Util.discard(input);
-    Util.discard(groups);
-    return Collections.emptyList();
+    return Collections.emptyList();

Done. Those calls were only there to suppress the unused-parameter warnings.

xiedeyantu · 2026-06-24T13:57:15Z

+   * Applies a FILTER clause to the arguments of an aggregate call by wrapping
+   * each argument in a CASE expression. For example,
+   * {@code SUM(sal) FILTER (WHERE comm IS NOT NULL)} becomes
+   * {@code SUM(CASE WHEN comm IS NOT NULL THEN sal END)}.


I'm unsure if the semantics of functions like FIRST_VALUE, LAST_VALUE, NTH_VALUE, LEAD, and LAG are correct after the rewrite.

Your concern is valid. The rewrite is only semantically correct for true aggregate functions like SUM, COUNT, AVG, MIN, and MAX, because those functions ignore NULL inputs.
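The equivalence for NULL-ignoring aggregates can be checked with a minimal sketch. The `Row` class and the sample data are hypothetical, and the two methods model SQL semantics by hand rather than calling any Calcite API.

```java
import java.util.List;

final class FilterAsCaseSketch {
  /** Hypothetical row; comm is nullable, as in the Javadoc example. */
  static final class Row {
    final int sal;
    final Integer comm;

    Row(int sal, Integer comm) {
      this.sal = sal;
      this.comm = comm;
    }
  }

  /** Models SUM(sal) FILTER (WHERE comm IS NOT NULL): filtered rows are skipped. */
  static Integer sumWithFilter(List<Row> rows) {
    Integer sum = null; // SUM over no rows is NULL
    for (Row r : rows) {
      if (r.comm != null) {
        sum = (sum == null) ? r.sal : sum + r.sal;
      }
    }
    return sum;
  }

  /** Models SUM(CASE WHEN comm IS NOT NULL THEN sal END): SUM ignores NULL arguments. */
  static Integer sumOfCase(List<Row> rows) {
    Integer sum = null;
    for (Row r : rows) {
      Integer arg = null; // CASE without ELSE yields NULL
      if (r.comm != null) {
        arg = r.sal;
      }
      if (arg != null) {
        sum = (sum == null) ? arg : sum + arg;
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(
        new Row(1600, 300), new Row(2975, null), new Row(1250, 500));
    System.out.println(sumWithFilter(rows) + " " + sumOfCase(rows));
  }
}
```

Both forms agree because a row that fails the filter contributes a NULL argument, which SUM skips in the same way it skips a filtered-out row.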

However, the rewrite does not hold for window value functions such as FIRST_VALUE, LAST_VALUE, NTH_VALUE, LEAD, and LAG. For example, PostgreSQL rejects the following SQL with an error:

SELECT ename, job,
  FIRST_VALUE(sal) FILTER (WHERE job = 'MANAGER') OVER (ORDER BY hiredate, ename) AS first_mgr_sal
FROM emp
ORDER BY hiredate, ename;

psql:commands.sql:81: ERROR: FILTER is not implemented for non-aggregate window functions
LINE 4: FIRST_VALUE(sal) FILTER (WHERE job = 'MANAGER') OVER (ORDE...

Therefore, based on the tests conducted so far, I believe there are no issues.
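To make the reviewer's concern concrete, the sketch below contrasts what a CASE rewrite would compute for FIRST_VALUE with a hypothetical "filter first" reading of the clause. Both method names and the sample data are assumptions for illustration; neither method reflects Calcite's or PostgreSQL's implementation, and PostgreSQL rejects this combination outright.

```java
final class FirstValueFilterSketch {
  /**
   * Models FIRST_VALUE(CASE WHEN keep THEN sal END) over a frame:
   * the first row's value is returned even when it is NULL.
   */
  static Integer firstValueOfCase(int[] sal, boolean[] keep) {
    if (sal.length == 0) {
      return null;
    }
    if (keep[0]) {
      return sal[0];
    }
    return null;
  }

  /** Hypothetical "filter first" reading: the first value among rows passing the filter. */
  static Integer firstValueAmongKept(int[] sal, boolean[] keep) {
    for (int i = 0; i < sal.length; i++) {
      if (keep[i]) {
        return sal[i];
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Made-up frame in window order; only the second row passes the filter.
    int[] sal = {800, 2975};
    boolean[] isManager = {false, true};
    System.out.println(firstValueOfCase(sal, isManager));
    System.out.println(firstValueAmongKept(sal, isManager));
  }
}
```

The two readings diverge as soon as the first row of the frame fails the filter, which is why the CASE rewrite is only safe for functions that ignore NULL inputs.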

Could you add a related test?

xiedeyantu

LGTM

sonarqubecloud · 2026-06-25T03:58:55Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
88.1% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

xuzifu666 · 2026-06-25T07:18:57Z

If there are no other comments, I will merge this PR soon. cc @iwanttobepowerful

iwanttobepowerful · 2026-06-25T07:38:04Z

Sorry, I’m not too familiar with this area.

xuzifu666 · 2026-06-25T07:50:46Z

Sorry, I’m not too familiar with this area.

Alright, the current issues should all be addressed. Let's see if there are any other suggestions; if not, I'll merge this in 48 hours. This is fairly urgent because it fixes a data quality issue.

xuzifu666 commented Jun 23, 2026

iwanttobepowerful reviewed Jun 23, 2026

xuzifu666 requested a review from iwanttobepowerful June 24, 2026 03:31

iwanttobepowerful reviewed Jun 24, 2026

iwanttobepowerful requested a review from xiedeyantu June 24, 2026 04:05

xiedeyantu reviewed Jun 24, 2026

xiedeyantu approved these changes Jun 25, 2026

[CALCITE-7620] Result of FILTER clause in window functions is incorrect

73b4fc7

xuzifu666 force-pushed the filter_fix branch from 080d0f1 to 73b4fc7 Compare June 25, 2026 07:52

xuzifu666 added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Jun 25, 2026

xuzifu666 merged commit 9abd95b into apache:main Jun 26, 2026
32 checks passed
